broader impact section
We will improve the broader impact section by emphasizing the implications of our theoretical
We sincerely thank all the reviewers and are honored to receive such positive and constructive comments. We will mention total variation distance in the appendix, and correct the typo on "Corollary Note that the smooth planning oracle is not needed throughout the paper, and is thus not the "primary It is only used in Sec. We have discussed R-MAX in lines 82-83. By "especially model-free ones..." in this sentence, we simply meant The works on Q-learning in games you mentioned address exactly this issue, with non-trivial effort. We will fix all the grammatical issues and typos in the final version.